The Mark-and-Sweep algorithm is a fundamental tracing garbage collection technique that operates in two distinct phases: first marking all reachable objects starting from root references, then sweeping the heap to reclaim memory from unmarked objects.
The Mark-and-Sweep algorithm solves the problem of automatic memory management by identifying which objects are still in use and which are garbage. It's called a 'tracing' garbage collector because it traces through the object graph starting from known roots . Unlike reference counting, it can handle circular references without getting stuck in infinite loops . The algorithm fundamentally requires two operations: detecting all unreachable objects and reclaiming the heap space they occupy .
Starting Point - The Roots: The algorithm begins from a set of known root objects—global variables, local variables on the stack, and CPU registers that can reference heap objects . In browsers, roots include the window object; in Node.js, the global object .
Graph Traversal: From each root, the algorithm traverses all references, following pointers to other objects recursively . This is typically implemented using depth-first search, visiting every reachable node in the object graph .
Marking Mechanism: Each object has a mark bit (initially set to 0/false). When visited, this bit is set to 1/true to indicate the object is reachable and should be preserved . This marking may use a separate bit table or a bit within the object itself .
Handling Cycles: Because the algorithm traces reachability from roots rather than counting references, it correctly identifies cycles as garbage if no external references point to them .
Heap Scan: After marking completes, the algorithm scans through the entire heap, examining every object in order of increasing address . This is a linear pass through memory that examines all allocated objects .
Reclamation Decision: Objects with their mark bit still set to 0 (false) are considered unreachable garbage. Their memory is freed and returned to the system or made available for new allocations .
Mark Reset: For objects that were marked (reachable), their mark bit is cleared back to 0 in preparation for the next garbage collection cycle . This ensures each collection starts with a clean state.
Free List Management: Reclaimed memory is typically added to a free list, which tracks available blocks for future allocations .
The Mark-and-Sweep algorithm has several important advantages. It handles cyclic references automatically, which is impossible for pure reference counting collectors . It adds no overhead during normal program execution—all work happens during collection pauses . However, it also has significant disadvantages. The most critical is that normal program execution must be suspended ('stop-the-world') while collection runs, which can cause noticeable pauses in interactive applications . Additionally, sweeping time is proportional to the total heap size, not just the amount of live data .
Memory Fragmentation: Over time, as objects are freed in arbitrary order, the heap becomes fragmented with small unused gaps between live objects . This can prevent allocation of large objects even when sufficient total free memory exists.
Compaction Solution: Many modern collectors add a compaction phase after sweeping, moving live objects together to create one contiguous free block . This eliminates fragmentation but adds overhead.
Incremental Marking: To reduce pause times, engines like V8 break marking into small steps interleaved with program execution, using write barriers to track references created during marking .
Concurrent Sweeping: Some engines perform sweeping on a background thread while the program continues running, reducing perceived pause times .
In modern JavaScript engines, pure Mark-and-Sweep is rarely used alone. Instead, it's combined with generational collection (young/old generations) to optimize performance . V8's Orinoco collector, for example, uses Mark-and-Sweep for the old generation but employs a copying collector for the young generation . The fundamental insight remains: by tracing reachability from roots, the algorithm correctly identifies garbage regardless of reference patterns, making it the foundation of virtually all modern production garbage collectors.